Graphical Analysis using Stata

Dimitrios Karamanis
University of Piraeus
dkaramanis@hotmail.com

In [1]:
%matplotlib inline
from ipystata.config import config_stata
from IPython.display import HTML, Image
# raw string so the backslashes in the Windows path are not read as escape sequences
config_stata(r'C:\Program Files\Stata14\StataSE-64.exe')

ITEMS:

  • Importance of data presentation

  • Types of Graphs

  • Misleading Graphs

  • Dos and Don'ts

  • Simple Line Graph

  • Multiple Dependent Variables

  • Scatter plot

  • Scatterplot matrix

  • Scatterplot with weighted markers

  • Dot Chart

  • Horizontal bar chart graphed over another variable, sorted

  • Box plot by values of categorical variable

  • Stack Graph

  • Scatter & Bar plot

  • Scatterplot with weighted markers and classes

  • Two-axis line graph

  • Radar Graph

  • Overlaying Two-Way Plot Types

  • Adding a Title and Removing the Legend

  • Showing Confidence Intervals, Labelling Axes, Modifying Legend

  • Marker Labels and Subtitles

  • Position of Marker Labels

  • Position of Marker Labels and Legend Display

  • Marker Size and Symbol, Line Color

  • Marker and Marker Label Color, Line Style

  • By-Graph: Separate Graphs for Each Subset of Data

  • By-Graph Options

  • Axis Scale, Ticks and Labels

  • Storing Graphs in Memory

  • Combining Graphs

    The appearance of a graph is defined by its elements:

  • data – marker symbols, lines
  • elements within the plot region – text, marker labels, line labels
  • elements outside the plot region – titles, legend, notes, axis labels, tick marks, axis titles
  • size and shape of the plot region and of the entire graph

  • LINK:

    Visual overview of creating graphs in Stata

    Importance of data presentation


    Let's see some examples:

    During WWII, the Navy tried to determine where to armor its aircraft so that more of them came back home. Analysts mapped where the returning planes had been shot up and came up with the picture below. The obvious conclusion was to up-armor the wingtips, the central body, and the elevators, since that is where the planes were getting hit. Abraham Wald, a statistician, disagreed. He argued that the armor should instead go on the nose area, the engines, and the mid-body. That seemed crazy, of course: those were not the places where the planes were getting shot. But Wald realized what the others didn’t. Planes were being hit there too; they just weren’t making it home. What the Navy thought it had analyzed was where aircraft suffered the most damage. What it had actually analyzed was where aircraft could take damage without catastrophic failure. The places with no holes? Planes hit there had crashed. The analysts were not looking at the whole sample, only at the survivors (survivorship bias).

    In [2]:
    Image(filename='airplane.PNG')
    
    Out[2]:
    More examples:
    In [3]:
    Image(filename='cholera.PNG')
    
    Out[3]:

    Types of graphs


    There are many types of graphs; here we present some of them.

    In [4]:
    Image(filename='typesgraphs.PNG')
    
    Out[4]:

    Misleading Graphs


    The “classic” types of misleading graphs include cases where:

  • The vertical scale is too big or too small, skips numbers, or doesn't start at zero (axis manipulation).

  • The graph isn't labeled properly.

  • Data is left out.
  • The wrong type of graph is used.
  • The graph goes against conventions.

    Source

    In [5]:
    Image(filename='MisleadingGraphs/1.JPG')
    
    Out[5]:
    In [6]:
    Image(filename='MisleadingGraphs/2.JPG')
    
    Out[6]:
    In [7]:
    Image(filename='MisleadingGraphs/3.JPG')
    
    Out[7]:
    In [8]:
    Image(filename='MisleadingGraphs/4.JPG')
    
    Out[8]:
    In [9]:
    Image(filename='MisleadingGraphs/5.JPG')
    
    Out[9]:
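    The axis-manipulation case in the list above is easy to quantify. The plain-Python sketch below (not Stata) computes how many times taller one bar looks than another when the value axis starts at some baseline instead of zero. The function name `apparent_ratio` and the values 50 and 55 are hypothetical and were chosen only for illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn height of bar b to bar a when the value axis starts at baseline."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


# Hypothetical values: 55 is only 10% larger than 50.
true_ratio = apparent_ratio(50, 55)                     # axis starts at zero
truncated_ratio = apparent_ratio(50, 55, baseline=48)   # axis starts at 48

print(f"axis from 0:  bar B looks {true_ratio:.2f} times as tall as bar A")
print(f"axis from 48: bar B looks {truncated_ratio:.2f} times as tall as bar A")
```

    With the axis starting at 48, a 10% difference is drawn as one bar three and a half times as tall as the other.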

    We will use several datasets. Two of them, uslifeexp and lifeexp, ship with Stata and are loaded with sysuse.


    In [10]:
    %%stata
    sysuse uslifeexp.dta, clear
    describe
    summarize
    
    (U.S. life expectancy, 1900-1999)
    
    Contains data from C:\Program Files (x86)\Stata14\ado\base/u/uslifeexp.dta
      obs:           100                          U.S. life expectancy, 1900-1999
     vars:            10                          30 Mar 2014 04:31
     size:         3,800                          (_dta has notes)
    --------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------
    year            int     %9.0g                 Year
    le              float   %9.0g                 life expectancy
    le_male         float   %9.0g                 Life expectancy, males
    le_female       float   %9.0g                 Life expectancy, females
    le_w            float   %9.0g                 Life expectancy, whites
    le_wmale        float   %9.0g                 Life expectancy, white males
    le_wfemale      float   %9.0g                 Life expectancy, white females
    le_b            float   %9.0g                 Life expectancy, blacks
    le_bmale        float   %9.0g                 Life expectancy, black males
    le_bfemale      float   %9.0g                 Life expectancy, black females
    --------------------------------------------------------------------------------
    Sorted by: year
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
            year |        100      1949.5    29.01149       1900       1999
              le |        100      64.829    9.158628       39.1       76.7
         le_male |        100      62.302    8.436369       36.6       73.9
       le_female |        100       67.51    9.834987       42.2       79.5
            le_w |        100      65.688    9.171269       39.8       77.3
    -------------+---------------------------------------------------------
        le_wmale |        100      63.143    8.503954       37.1       74.6
      le_wfemale |        100      68.434    9.797167       43.2         80
            le_b |        100      56.033    12.48937       30.8       71.4
        le_bmale |        100      53.589     11.4569       29.1       67.8
      le_bfemale |        100      58.567     13.5409       32.5       74.8
    
    
    In [11]:
    %%stata
    sysuse lifeexp.dta, clear
    describe
    summarize
    
    * see how many different countries we have
    codebook country
    * we have 68 observations and 6 variables
    
    (Life expectancy, 1998)
    
    Contains data from C:\Program Files (x86)\Stata14\ado\base/l/lifeexp.dta
      obs:            68                          Life expectancy, 1998
     vars:             6                          26 Mar 2014 09:40
     size:         2,652                          (_dta has notes)
    --------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------
    region          byte    %12.0g     region     Region
    country         str28   %28s                  Country
    popgrowth       float   %9.0g               * Avg. annual % growth
    lexp            byte    %9.0g               * Life expectancy at birth
    gnppc           float   %9.0g               * GNP per capita
    safewater       byte    %9.0g               * 
                                                * indicated variables have notes
    --------------------------------------------------------------------------------
    Sorted by: 
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          region |         68         1.5    .7431277          1          3
         country |          0
       popgrowth |         68    .9720588    .9311918        -.5          3
            lexp |         68    72.27941    4.715315         54         79
           gnppc |         63    8674.857    10634.68        370      39980
    -------------+---------------------------------------------------------
       safewater |         40        76.1    17.89112         28        100
    
    --------------------------------------------------------------------------------
    country                                                                  Country
    --------------------------------------------------------------------------------
    
                      type:  string (str28)
    
             unique values:  68                       missing "":  0/68
    
                  examples:  "Chile"
                             "Greece"
                             "Mexico"
                             "Slovenia"
    
                   warning:  variable has embedded blanks
    
    
    

    Simple Line Graph

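    The same graph can be drawn with three equivalent commands (graph twoway line, twoway line, and line). The last two cells change the overall look with the scheme() option.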

    In [12]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    graph twoway line le year
    
    (U.S. life expectancy, 1900-1999)
    
    
    In [13]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    twoway line le year
    
    (U.S. life expectancy, 1900-1999)
    
    
    In [14]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    line le year
    
    (U.S. life expectancy, 1900-1999)
    
    
    In [15]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    twoway line le year, scheme(s1mono)
    
    (U.S. life expectancy, 1900-1999)
    
    
    In [16]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    twoway line le year,  scheme(economist)
    
    (U.S. life expectancy, 1900-1999)
    
    

    Multiple Dependent Variables

    In [17]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    twoway line le_wmale le_wfemale le_bmale le_bfemale year
    
    (U.S. life expectancy, 1900-1999)
    
    
    In [18]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    twoway line le_wmale le_wfemale le_bmale le_bfemale year  ///
    , text(32 1920 "{bf:1918} {it:Influenza} Pandemic", place(3))
    
    (U.S. life expectancy, 1900-1999)
    
    
    

    Scatter Plot

    In [19]:
    %%stata
    sysuse lifeexp.dta, clear
    
    scatter lexp safewater
    
    (Life expectancy, 1998)
    
    

    Scatterplot matrix

    In [20]:
    %%stata
    sysuse lifeexp.dta, clear
    
    graph matrix lexp safewater gnppc popgrowth
    
    (Life expectancy, 1998)
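    `graph matrix` draws one scatterplot for every pair of the listed variables, so the number of panels grows quickly as variables are added. As a rough illustration in plain Python (not Stata), the sketch below lists the distinct pairs for the four variables used above. By default the matrix shows each pair twice, once above and once below the diagonal. Stata's `half` option keeps only the lower triangle.

```python
from itertools import combinations

variables = ["lexp", "safewater", "gnppc", "popgrowth"]

# Each distinct pair corresponds to one panel in the lower triangle.
pairs = list(combinations(variables, 2))
k = len(variables)
full_panels = k * (k - 1)   # off-diagonal panels in the full matrix
half_panels = len(pairs)    # panels kept with the half option

for first, second in pairs:
    print(f"{first} & {second}")
print("full matrix:", full_panels, "panels; half:", half_panels, "panels")
```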
    
    

    Scatterplot with weighted markers

    In [21]:
    %%stata
    sysuse lifeexp.dta, clear
    
    twoway scatter  lexp safewater  [w= gnppc] if region==2  /* North America */ /// 
        & gnppc ~=., msymbol(circle_hollow) || scatter lexp safewater  if region==2 & gnppc ~=., ///
        msymbol(none)  mlabel(country)  mlabposition(0) legend(off) ///
        note("Note: Area of symbol proportional to country's GNP per capita")
    
    (Life expectancy, 1998)
    
    (analytic weights assumed)
    (analytic weights assumed)
    (analytic weights assumed)
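    The note on this graph says that the area of each symbol is proportional to GNP per capita. If that holds, the symbol's diameter grows only with the square root of the weight, so a country with four times the GNP per capita gets a circle twice as wide. The plain-Python sketch below shows that geometric relationship. It is not Stata's own code, and the GNP figures are hypothetical.

```python
import math

# Hypothetical GNP-per-capita weights for three countries.
weights = {"A": 2_000, "B": 8_000, "C": 32_000}

smallest = min(weights.values())
for name, w in weights.items():
    area_ratio = w / smallest               # area proportional to the weight
    diameter_ratio = math.sqrt(area_ratio)  # diameter proportional to sqrt(area)
    print(f"{name}: area x{area_ratio:.0f}, diameter x{diameter_ratio:.0f}")
```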
    
    

    Dot chart

    In [22]:
    %%stata
    sysuse lifeexp.dta, clear
    
    graph dot gnppc if region==2 , over(country) ytitle(GNP per capita in $)
    
    (Life expectancy, 1998)
    
    

    Horizontal bar chart graphed over another variable, sorted

    In [23]:
    %%stata
    sysuse lifeexp.dta, clear
    
    graph hbar  lexp if region==2 , over(country  , sort(1) ///
     descending label(labsize(small))) ///
     title("Life expectancy at birth" ,size(medium)) ///
     ytitle("Age")
    
    (Life expectancy, 1998)
    
    

    Box plot by values of categorical variable

    In [24]:
    %%stata
    sysuse lifeexp.dta, clear
    
    graph box gnppc , over( region )
    
    (Life expectancy, 1998)
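    A box plot summarizes each group by its median, quartiles, and whiskers. Stata's `graph box` draws the whiskers to the adjacent values, which are the most extreme observations within 1.5 times the interquartile range of the nearer quartile. Anything beyond them is plotted individually. The plain-Python sketch below reproduces that logic on hypothetical data. Python's `statistics.quantiles` may use a slightly different percentile definition from Stata's, so the quartiles can differ a little on small samples.

```python
from statistics import median, quantiles


def box_summary(data):
    """Median, quartiles, adjacent values and outside values for one group."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "lower_adjacent": min(inside),   # end of the lower whisker
        "upper_adjacent": max(inside),   # end of the upper whisker
        "outside": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }


# Hypothetical GNP-per-capita values (in $1,000s) for one region.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

    Here the value 100 lies far beyond the upper fence, so it would be drawn as a separate point and the upper whisker would stop at 9.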
    
    

    Stack Graph

    In [25]:
    %%stata
    cd C:\Users\Dimitris\Documents
    use data.dta, clear
    bysort country: sum subjective
    
    C:\Users\Dimitris\Documents
    
    --------------------------------------------------------------------------------
    -> country = AUT
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     10,699    4.030283    .8634683          1          5
    
    --------------------------------------------------------------------------------
    -> country = BEL
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     14,338    3.943437    .7998015          1          5
    
    --------------------------------------------------------------------------------
    -> country = CHE
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     13,856    4.119948    .7762556          1          5
    
    --------------------------------------------------------------------------------
    -> country = CZE
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     15,151     3.70893    .9562848          1          5
    
    --------------------------------------------------------------------------------
    -> country = DEU
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     23,313     3.64316    .8921426          1          5
    
    --------------------------------------------------------------------------------
    -> country = DNK
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     10,810    4.084366    .9092944          1          5
    
    --------------------------------------------------------------------------------
    -> country = ESP
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     15,496    3.693082    .9245251          1          5
    
    --------------------------------------------------------------------------------
    -> country = EST
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     13,398     3.42275    .8867981          1          5
    
    --------------------------------------------------------------------------------
    -> country = FIN
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     16,190    3.814021    .8298666          1          5
    
    --------------------------------------------------------------------------------
    -> country = FRA
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     15,043    3.713156    .8896451          1          5
    
    --------------------------------------------------------------------------------
    -> country = GBR
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     17,608    3.899307    .9491673          1          5
    
    --------------------------------------------------------------------------------
    -> country = GRC
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |      9,758    4.123796    .9500411          1          5
    
    --------------------------------------------------------------------------------
    -> country = HUN
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     13,123    3.452793    .9806661          1          5
    
    --------------------------------------------------------------------------------
    -> country = IRL
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     18,236    4.170432    .8148706          1          5
    
    --------------------------------------------------------------------------------
    -> country = ITA
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |      4,778    3.804102    .8538403          1          5
    
    --------------------------------------------------------------------------------
    -> country = NLD
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     15,180    3.817721    .7705561          1          5
    
    --------------------------------------------------------------------------------
    -> country = NOR
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     13,244    4.033223     .876044          1          5
    
    --------------------------------------------------------------------------------
    -> country = POL
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     14,099    3.629903     .927383          1          5
    
    --------------------------------------------------------------------------------
    -> country = PRT
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     14,980    3.421362    .9088348          1          5
    
    --------------------------------------------------------------------------------
    -> country = SVK
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |      8,772    3.602029    .9213566          1          5
    
    --------------------------------------------------------------------------------
    -> country = SVN
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     10,906    3.614157     .922905          1          5
    
    --------------------------------------------------------------------------------
    -> country = SWE
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
      subjective |     14,373    4.026021    .8554059          1          5
    
    
    In [26]:
    %%stata
    sort country
    by country: egen subjective11=count(subjective) if subjective==1
    by country: egen subjective22=count(subjective) if subjective==2
    by country: egen subjective33=count(subjective) if subjective==3
    by country: egen subjective44=count(subjective) if subjective==4
    by country: egen subjective55=count(subjective) if subjective==5
    
    graph bar subjective11 subjective22 subjective33 subjective44 subjective55,  percentage over(country, label(angle(90))) ///
     stack title("Subjective general health in EU countries") ytitle("%") ///
     legend( label(1 "Very bad") label(2 "Bad") label(3 "Fair")  label(4 "Good") label(5 "Very good") cols(5)  symxsize(10) ) ///
     bar(1,color(purple)) bar(2,color(red)) bar(3,color(orange)) bar(4,color(ebblue)) bar(5,color(green))
     
    
    (299484 missing values generated)
    
    (283848 missing values generated)
    
    (225945 missing values generated)
    
    (172509 missing values generated)
    
    (233263 missing values generated)
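    Each `egen ... count()` line above stores the number of respondents in a country who gave a given answer, on that country's rows with that answer. `graph bar` averages each new variable within country by default, which simply returns that count. The `percentage` option then rescales the five counts to shares of their total, so each stacked bar adds up to 100%. The plain-Python sketch below performs the same arithmetic on a handful of hypothetical responses. It illustrates the computation only and is not Stata's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical (country, answer) pairs; answers run from 1 (very bad) to 5 (very good).
responses = [
    ("AUT", 4), ("AUT", 5), ("AUT", 4), ("AUT", 3),
    ("GRC", 5), ("GRC", 5), ("GRC", 2), ("GRC", 4),
]

counts = defaultdict(Counter)
for country, answer in responses:
    counts[country][answer] += 1          # like egen subjectiveXX = count(...)

shares = {}
for country, by_answer in counts.items():
    total = sum(by_answer.values())
    # percentage option: each category as a share of the country's total
    shares[country] = {a: 100 * by_answer[a] / total for a in range(1, 6)}

for country, pct in sorted(shares.items()):
    print(country, pct)
```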
    
    

    Scatter & Bar Plot

    In [27]:
    %%stata
    use C:\Users\Dimitris\Documents\data.dta, clear
    collapse (mean) subjective imm_stock imm_inflows , by(country)
    encode country,gen(id)
     
    **SCATTER & BAR
    twoway (bar imm_stock id ) (scatter imm_inflows id, mfcolor(red) ytitle(% of population) ///
     title(Stock & Inflows of foreigners per country, size(medium)) ///
     xtitle("") xlabel(1(1)22, labsize(small) angle(vertical) valuelabel) ///
     legend(ring(0) pos(2) order(1 "Stock" 2 "Inflows ") cols(1))  note(Source: OECD International Migration Database) )
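    `collapse (mean) ..., by(country)` replaces the data with one row per country that holds the group means. `encode` then turns the string country codes into integers 1, 2, ..., assigned in alphabetical order, and attaches the codes as value labels. Those integers are what the bar and scatter layers use on the x axis, and `xlabel(..., valuelabel)` displays the labels. The plain-Python sketch below mimics both steps on hypothetical numbers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical individual-level rows: (country, immigrant stock %, inflows %).
rows = [
    ("BEL", 12.0, 1.0), ("BEL", 14.0, 1.2),
    ("AUT", 16.0, 1.5), ("AUT", 18.0, 1.7),
]

# collapse (mean) ..., by(country): one row of means per country.
groups = defaultdict(list)
for country, stock, inflow in rows:
    groups[country].append((stock, inflow))
collapsed = {
    c: (mean(s for s, _ in vals), mean(i for _, i in vals))
    for c, vals in groups.items()
}

# encode country, gen(id): integer codes in sorted order of the strings.
codes = {c: k for k, c in enumerate(sorted(collapsed), start=1)}

for c in sorted(collapsed):
    print(codes[c], c, collapsed[c])
```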
    

    Radar Graph

    In [28]:
    %%stata
    /*first install radar by typing: ssc install radar*/
    
    sysuse auto.dta, clear
    
    radar make turn mpg trunk if foreign, title(Radar graph) ///
    lc(red blue green) lw(*1 *2 *4) r(0 12 14 18 50) 
    
    (1978 Automobile Data)
    
    (52 observations deleted)
    
    

    Scatterplot with weighted markers and classes

    In [29]:
    %%stata
    use data2.dta, clear
    gen country_code1=country_code if Open_gate_wall == "Gate"          
    gen country_code3=country_code if Open_gate_wall == "Wall" 
    gen country_code_GRC=country_code if country_code=="GRC"
    gen country_code_high=country_code if RD>3 
    gen country_code_highFD=country_code if FD>0.9 
    
    separate RD, by(Open_gate_wall) 
    separate lngdp_pc , by(Open_gate_wall) 
    separate FD, by(Open_gate_wall)  
    separate ka, by(Open_gate_wall)  
    
    scatter RD lngdp_pc, msymbol(i) mlabpos(c) mlabel(country_code3) mlabcolor(red) ///
    	|| scatter RD lngdp_pc, msymbol(i) mlabpos(3) mlabel(country_code_high) mlabcolor(green) ///
    	|| scatter RD lngdp_pc, msymbol(i) mlabpos(c) mlabel(country_code_GRC) mlabcolor(green) ///
    	|| scatter RD1 lngdp_pc [w=ka], msymbol(circle_hollow) mcolor(gs12) mlcolor(blue) ytitle("R&D %GDP") xtitle("GDP per capita") legend(on) ///
    	title("R&D, GDP per Capita and Capital Controls") || scatter RD2 lngdp_pc [w=ka], msymbol(circle_hollow) mcolor(gs12) mlcolor(green) ///
    	|| scatter RD3 lngdp_pc [w=ka], msymbol(circle_hollow) mcolor(gs12) mlcolor(red) note("Note: Area of symbol proportional to country's Capital Control level") ///
    	legend(ring(0) pos(2) col(1) order(6 4 5) label(4 "Gate") label(5 "Open") label(6 "Wall"))
    
    (29 missing values generated)
    
    (48 missing values generated)
    
    (53 missing values generated)
    
    (51 missing values generated)
    
    (53 missing values generated)
    
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------
    RD1             double  %10.0g                RD, Open_gate_wall == Gate
    RD2             double  %10.0g                RD, Open_gate_wall == Open
    RD3             double  %10.0g                RD, Open_gate_wall == Wall
    
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------
    lngdp_pc1       float   %9.0g                 lngdp_pc, Open_gate_wall == Gate
    lngdp_pc2       float   %9.0g                 lngdp_pc, Open_gate_wall == Open
    lngdp_pc3       float   %9.0g                 lngdp_pc, Open_gate_wall == Wall
    
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------
    FD1             double  %10.0g                FD, Open_gate_wall == Gate
    FD2             double  %10.0g                FD, Open_gate_wall == Open
    FD3             double  %10.0g                FD, Open_gate_wall == Wall
    
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------
    ka1             double  %9.0g                 ka, Open_gate_wall == Gate
    ka2             double  %9.0g                 ka, Open_gate_wall == Open
    ka3             double  %9.0g                 ka, Open_gate_wall == Wall
    
    (analytic weights assumed)
    (analytic weights assumed)
    (analytic weights assumed)
    (analytic weights assumed)
    (analytic weights assumed)
    (analytic weights assumed)
    (analytic weights assumed)
    (analytic weights assumed)
    (analytic weights assumed)
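    `separate RD, by(Open_gate_wall)` creates one new variable per category (RD1, RD2 and RD3 above). Each one equals RD on the rows of its category and is missing elsewhere. Plotting these variables as separate scatter layers is what gives each class its own color. The plain-Python sketch below imitates that split, using `None` for Stata's missing value and hypothetical numbers.

```python
def separate(values, groups):
    """Split one column into one column per group, with None (missing) elsewhere."""
    levels = sorted(set(groups))
    return {
        level: [v if g == level else None for v, g in zip(values, groups)]
        for level in levels
    }


rd = [1.2, 3.4, 0.8, 2.1]                  # hypothetical R&D, % of GDP
regime = ["Gate", "Wall", "Open", "Gate"]  # hypothetical Open_gate_wall values

columns = separate(rd, regime)
for level, column in columns.items():
    print(level, column)
```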
    
    

    Two-axis line graph (use with care when the two series have very different scales)

    In [30]:
    %%stata
    use data3.dta, clear
    sort year
    by year:  egen mean_RD=mean(RD)
    by year: egen mean_ka=mean(ka)
    label variable mean_RD "R&D %GDP"
    label variable mean_ka "Capital Controls"
    
    twoway connected mean_ka year, msymbol(diamond) title("R&D as %GDP and Capital Controls through years", size(medium)) yscale(alt axis(1)) ///
     || connected mean_RD year, yaxis(2) yscale(alt axis(2)) legend(ring(0) pos(5) col(1) label(2 "R&D %GDP") label(1 "Capital Controls") size(small)) ///
     xtitle("Year") xsc(r(1996 2013)) xlabel(1996(2)2013, labsize(small)) note("NOTE: Capital controls range from 0 (open) to 1 (closed)", justification(left) box)
    

    Below: exactly the same data, plotted on a single y axis!

    In [31]:
    %%stata
    use data3.dta, clear
    sort year
    by year:  egen mean_RD=mean(RD)
    by year: egen mean_ka=mean(ka)
    label variable mean_RD "R&D %GDP"
    label variable mean_ka "Capital Controls"
    
    line mean_ka mean_RD year ,  title("R&D as %GDP and Capital Controls through years", size(medium)) ///
     xtitle("Year") legend(label(2 "R&D %GDP") label(1 "Capital Controls") ring(0) pos(5) col(1)) ///
     xsc(r(1996 2013)) xlabel(1996(2)2013 ,labsize(small))
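    Why do the two graphs look so different? With a second axis, each series is stretched to fill its own range. Where the two lines sit relative to each other, and whether they appear to cross, then depends on the axis ranges chosen rather than on the data. The plain-Python sketch below maps each series to relative heights in the plot region under both setups. The values and axis ranges are hypothetical.

```python
def to_heights(values, axis_min, axis_max):
    """Relative height of each value in the plot region (0 = bottom, 1 = top)."""
    span = axis_max - axis_min
    return [round((v - axis_min) / span, 3) for v in values]


capital_controls = [0.30, 0.25, 0.20]   # hypothetical index, 0 (open) to 1 (closed)
rd_gdp = [1.5, 1.6, 1.8]                # hypothetical R&D as % of GDP

# Two axes: each series gets a range fitted to its own data.
dual = (to_heights(capital_controls, 0.2, 0.3), to_heights(rd_gdp, 1.5, 1.8))
# One axis: both series share the range 0 to 2.
single = (to_heights(capital_controls, 0, 2), to_heights(rd_gdp, 0, 2))

print("two axes:", dual)
print("one axis:", single)
```

    With two axes, the lines run from opposite corners and cross. On the shared axis, the R&D series stays well above the capital-control series throughout.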
    

    Overlaying Two-Way Plot Types

    In [32]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    scatter le_male le_female year if year >= 1950 ///
    || lfit le_male   year if year >= 1950 ///
    || lfit le_female year if year >= 1950
    
    (U.S. life expectancy, 1900-1999)
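    `lfit` overlays the straight line from a least-squares regression of the y variable on the x variable. For a simple regression, the slope is b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2) and the intercept is a = ybar - b*xbar. The plain-Python sketch below computes both for a few hypothetical (year, life expectancy) points.

```python
def least_squares_line(x, y):
    """Intercept and slope of the OLS line of y on x (the line lfit draws)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


years = [1950, 1960, 1970, 1980]   # hypothetical points
life_exp = [66, 67, 68, 70]

a, b = least_squares_line(years, life_exp)
print(f"slope = {b:.3f} years of life per calendar year")
print(f"fitted value in 1975 = {a + b * 1975:.2f}")
```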
    
    

    Overlaying Two-Way Plot Types - Adding a Title

    In [33]:
    %%stata
    sysuse uslifeexp.dta, clear
    
    scatter le_male le_female year if year >= 1950 ///
    || lfit le_male   year if year >= 1950 ///
    || lfit le_female year if year >= 1950 ///
    ,title("US Male and Female Life Expectancy, 1950-2000") ///
     text(75 1978 "Female", place(3)) ///
     text(68 1978 "Male", place(3))
     
    
    (U.S. life expectancy, 1900-1999)
    
    

    Showing Confidence Intervals, Labelling Axes, Modifying Legend

    In [34]:
    %%stata
    sysuse lifeexp.dta, clear
    
    twoway ///
     (lfitci  lexp safewater if region == 2) /* North America */ ///
     (scatter lexp safewater if region == 2) ///
    ,title("Life expectancy at birth by access to safe water, 1998")  ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     legend(ring(0) pos(5) order(2 "Linear fit" 1 "95% CI"))
    
    (Life expectancy, 1998)
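    `lfitci` adds a shaded band around the fitted line. As we understand Stata's default (the `stdp` option), the band is the confidence interval for the mean prediction: yhat(x0) ± t * s * sqrt(1/n + (x0 - xbar)^2 / Sxx). Here s is the residual standard error with n - 2 degrees of freedom. The band is narrowest at the mean of x and widens toward the extremes, which is why it flares at the edges of the graph. In the plain-Python sketch below, the t critical value is passed in by hand (2.776 is the 97.5th percentile of t with 4 degrees of freedom), and the data are hypothetical.

```python
import math


def mean_ci_band(x, y, x0, t_crit):
    """Lower bound, fitted value and upper bound for the mean of y at x0 (simple OLS).

    The confidence level is determined by the t critical value passed in.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # residual std. error
    fit = intercept + slope * x0
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return fit - half_width, fit, fit + half_width


# Hypothetical (safe water %, life expectancy) pairs; n = 6, so 4 degrees of freedom.
water = [50, 60, 70, 80, 90, 100]
lexp = [64, 67, 68, 72, 73, 76]
T_975_DF4 = 2.776  # t critical value for a two-sided 95% interval with 4 df

for x0 in (50, 75, 100):
    low, fit, high = mean_ci_band(water, lexp, x0, T_975_DF4)
    print(f"x = {x0}: {low:.2f} to {high:.2f} (fit {fit:.2f})")
```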
    
    

    Marker Labels and Subtitles

    In [35]:
    %%stata
    sysuse lifeexp.dta, clear
    
    twoway ///
     (lfitci  lexp safewater if region == 2) /* North America */ ///
     (scatter lexp safewater if region == 2, mlabel(country)) ///
    ,title("Life expectancy at birth by access to safe water, 1998") ///
     subtitle("North America")  ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     legend(ring(0) pos(5) order(2 "Linear fit" 1 "95% CI"))
    
    (Life expectancy, 1998)
    
    

    Position of Marker Labels

    In [36]:
    %%stata
    sysuse lifeexp.dta, clear
    
    generate pos = 12 if country == "Panama"
    replace  pos = 12 if country == "Honduras"
    replace  pos = 10 if country == "Cuba"
    replace  pos =  9 if country == "Jamaica"
    replace  pos =  9 if country == "El Salvador"
    replace  pos =  9 if country == "Trinidad and Tobago"
    replace  pos =  9 if country == "Dominican Republic"
    
    twoway ///
     (lfitci  lexp safewater if region == 2) /* North America */ ///
     (scatter lexp safewater if region == 2 ///
      , mlabel(country) mlabvposition(pos)) ///
    ,title("Life expectancy at birth by access to safe water, 1998") ///
     subtitle("North America")  ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     legend(ring(0) pos(5) order(2 "Linear fit" 1 "95% CI")) ///
     plotregion(margin(r+10))
    
    (Life expectancy, 1998)
    
    (67 missing values generated)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
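    `mlabvposition(pos)` reads each label's position from the variable `pos` and uses Stata's clock-position convention. A value of 12 puts the label above the marker, 3 to the right, 6 below, 9 to the left, and 0 directly on the marker. The hypothetical plain-Python helper below converts a clock position into the direction in which the label is offset. This can help when deciding which value to assign to crowded points.

```python
import math


def clock_direction(position):
    """Unit offset (dx, dy) of a marker label for a clock position 0-12."""
    if not 0 <= position <= 12:
        raise ValueError("clock positions run from 0 to 12")
    if position == 0:
        return (0.0, 0.0)                        # 0: label centered on the marker
    angle = math.radians(90 - 30 * position)     # 12 o'clock points straight up
    return (round(math.cos(angle), 3), round(math.sin(angle), 3))


for pos in (12, 3, 9, 10):
    print(pos, clock_direction(pos))
```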
    
    

    Position of Marker Labels

    In [37]:
    %%stata
    sysuse lifeexp.dta, clear
    
    twoway (scatter lexp safewater if region == 2 | region == 3 ///
      ,mlabel(country)) ///
     ,title("Life expectancy at birth by access to safe water, 1998") ///
     subtitle("North and South America")  ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     plotregion(margin(r+10))
    
    (Life expectancy, 1998)
    
    

    Position of Marker Labels and Legend Display

    In [38]:
    %%stata
    sysuse lifeexp.dta, clear
    
    generate pos =  3
    replace  pos =  9 if country == "Argentina"
    replace  pos =  9 if country == "Canada"
    replace  pos =  9 if country == "Cuba"
    replace  pos =  9 if country == "Panama"
    replace  pos =  9 if country == "Venezuela"
    replace  pos =  9 if country == "Jamaica"
    replace  pos =  9 if country == "Dominican Republic"
    replace  pos =  9 if country == "Ecuador"
    replace  pos =  9 if country == "El Salvador"
    replace  pos = 12 if country == "Puerto Rico"
    
    twoway ///
     (scatter lexp safewater if region == 2 ///
      ,mlabel(country) mlabvposition(pos)) ///
     (scatter lexp safewater if region == 3 ///
      ,mlabel(country) mlabvposition(pos)) ///
    ,title("Life expectancy at birth by access to safe water, 1998") ///
     subtitle("North and South America")  ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     legend(ring(0) pos(5) order(1 "North America" 2 "South America") cols(1))
    
    (Life expectancy, 1998)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    

    Marker Size and Symbol, Line Color

    In [39]:
    %%stata
    sysuse lifeexp.dta, clear
    
    generate pos =  3
    replace  pos =  9 if country == "Argentina"
    replace  pos =  9 if country == "Canada"
    replace  pos =  9 if country == "Cuba"
    replace  pos =  9 if country == "Panama"
    replace  pos =  9 if country == "Venezuela"
    replace  pos =  9 if country == "Jamaica"
    replace  pos =  9 if country == "Dominican Republic"
    replace  pos =  9 if country == "Ecuador"
    replace  pos =  9 if country == "El Salvador"
    replace  pos = 12 if country == "Puerto Rico"
    
    twoway ///
     (scatter lexp safewater if region == 2 ///
      ,mlabel(country) mlabvposition(pos) msize(small)) ///
     (scatter lexp safewater if region == 3 ///
      ,mlabel(country) mlabvposition(pos) msize(small) msymbol(circle_hollow)) ///
    (lfit lexp safewater if region == 2, clcolor(navy)) ///
    (lfit lexp safewater if region == 3, clcolor(maroon)) ///
    ,title("Life expectancy at birth by access to safe water, 1998") ///
     subtitle("North and South America")   ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     legend(ring(0) pos(5) cols(1) order(1 "North America" 2 "South America"  ///
       3 "North America linear fit" 4 "South America linear fit"))
    
    (Life expectancy, 1998)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
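`lfit` draws the prediction from a linear regression of the y variable on the x variable. As a rough check of what the fitted lines represent, the North American line can be reproduced with `regress` and `predict`. This is a sketch; `lexp_hat` is a hypothetical variable name.

```stata
sysuse lifeexp.dta, clear

* Regress life expectancy on safe-water access for North America only
regress lexp safewater if region == 2

* Linear prediction (the fitted line) for the observations used in the regression
predict lexp_hat if e(sample), xb

* Overlay the fitted values as a line; sort so the line is drawn left to right
twoway (scatter lexp safewater if region == 2) ///
       (line lexp_hat safewater if region == 2, sort) ///
       , legend(order(1 "North America" 2 "Fitted values"))
```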
    
    

    Marker and Marker Label Color, Line Style

    In [40]:
    %%stata
    sysuse lifeexp.dta, clear
    
    generate pos =  3
    replace  pos =  9 if country == "Argentina"
    replace  pos =  9 if country == "Canada"
    replace  pos =  9 if country == "Cuba"
    replace  pos =  9 if country == "Panama"
    replace  pos =  9 if country == "Venezuela"
    replace  pos =  9 if country == "Jamaica"
    replace  pos =  9 if country == "Dominican Republic"
    replace  pos =  9 if country == "Ecuador"
    replace  pos =  9 if country == "El Salvador"
    replace  pos = 12 if country == "Puerto Rico"
    twoway ///
     (scatter lexp safewater if region == 2 ///
      ,mlabel(country) mlabvposition(pos) msize(small) mcolor(black) mlabcolor(black)) ///
     (scatter lexp safewater if region == 3 ///
      ,mlabel(country) mlabvposition(pos) msize(small) mcolor(black) mlabcolor(black) ///
       msymbol(circle_hollow)) ///
    (lfit lexp safewater if region == 2, clcolor(black)) ///
    (lfit lexp safewater if region == 3, clcolor(black) clpattern(dash)) ///
    ,title("Life expectancy at birth by access to safe water, 1998", color(black)) ///
     subtitle("North and South America") ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     legend(ring(0) pos(5) cols(1) order(1 "North America" 2 "South America" ///
       3 "North America linear fit" 4 "South America linear fit"))
    
    (Life expectancy, 1998)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    
    
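Instead of setting every color option to black by hand, a monochrome scheme such as `s1mono` (shipped with Stata) can give a similar black-and-white look. This is a sketch, not a reproduction of the graph above: marker labels are omitted for brevity, the scheme's own styling will differ in detail, and `bw_sketch` is a hypothetical graph name.

```stata
sysuse lifeexp.dta, clear

* Same groups and fit lines, but let the monochrome scheme choose
* the colors instead of mcolor()/clcolor()/mlabcolor()
twoway ///
 (scatter lexp safewater if region == 2, msize(small)) ///
 (scatter lexp safewater if region == 3, msize(small) msymbol(circle_hollow)) ///
 (lfit lexp safewater if region == 2) ///
 (lfit lexp safewater if region == 3, clpattern(dash)) ///
 , scheme(s1mono) name(bw_sketch, replace) ///
 legend(ring(0) pos(5) cols(1) order(1 "North America" 2 "South America" ///
   3 "North America linear fit" 4 "South America linear fit"))
```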

    By-Graph: Separate Graphs for Each Subset of Data

    In [41]:
    %%stata
    sysuse lifeexp.dta, clear
    
    twoway scatter lexp safewater, by(region, total) ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water")
    
    (Life expectancy, 1998)
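`by(region, total)` draws one panel for each value of `region` plus an extra Total panel containing all observations. Tabulating the grouping variable first is a quick way to see which panels, and how many observations in each, to expect. A minimal sketch:

```stata
sysuse lifeexp.dta, clear

* One by() panel is drawn per distinct value listed here;
* the total suboption adds one more panel for all observations
tabulate region
```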
    
    

    By-Graph Options

    In [42]:
    %%stata
    sysuse lifeexp.dta, clear
    
    twoway scatter lexp safewater ///
    ,by(region,total style(compact) ///
        title("Life expectancy by access to safe water") note("")) ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water")
    
    (Life expectancy, 1998)
    
    

    Axis Scale, Ticks and Labels

    In [43]:
    %%stata
    sysuse lifeexp.dta, clear
    
    twoway scatter lexp safewater ///
    , by(region,total style(compact) ///
         title("Life expectancy by access to safe water") note("")) ///
     xscale(range(20 100)) ///
     xtick(20(10)100) ///
     xlabel(30(10)100, labsize(small)) ///
     xtitle("Percent of population with access to safe water") ///
     ytitle("Life expectancy at birth") ///
     ylabel(55(5)80, angle(0))
    
    (Life expectancy, 1998)
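The `a(b)c` shorthand used in `xtick(20(10)100)`, `xlabel(30(10)100)` and `ylabel(55(5)80)` means "from a to c in steps of b". Stata's `numlist` command expands the same notation, which is a handy way to check which ticks or labels an axis option will produce.

```stata
* Expand the shorthand used in ylabel(55(5)80)
numlist "55(5)80"
display "`r(numlist)'"
* displays: 55 60 65 70 75 80

* Expand the shorthand used in xtick(20(10)100)
numlist "20(10)100"
display "`r(numlist)'"
* displays: 20 30 40 50 60 70 80 90 100
```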
    
    

    Storing Graphs in Memory

    In [44]:
    %%stata
    sysuse lifeexp.dta, clear
    
    generate  pos =  3
    replace   pos =  6 if country == "Honduras"
    replace   pos =  9 if country == "Canada"
    replace   pos =  9 if country == "Cuba"
    replace   pos =  9 if country == "Guatemala"
    replace   pos =  9 if country == "Panama"
    replace   pos =  9 if country == "Jamaica"
    replace   pos =  9 if country == "Dominican Republic"
    replace   pos =  9 if country == "Ecuador"
    replace   pos =  9 if country == "El Salvador"
    replace   pos = 12 if country == "Puerto Rico"
    
    twoway ///
     (scatter lexp safewater if region == 2, ///
      mcolor(black) msize(small) ///
      mlabel(country) mlabvposition(pos) mlabcolor(black)) ///
     (lfit lexp safewater if region == 2, clcolor(black)) ///
    ,name(north_america, replace) ///
     subtitle("North America", color(black)) ///
     ylabel(,angle(0)) ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     legend(off)
    
    (Life expectancy, 1998)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    (1 real change made)
    
    

    Storing Graphs in Memory (continued; run the cell above first, since it creates pos and north_america)

    In [45]:
    %%stata
    replace  pos = 9 if country == "Venezuela"
    replace  pos = 9 if country == "Argentina"
    replace  pos = 9 if country == "Ecuador"
    
    twoway ///
     (scatter lexp safewater if region == 3, ///
      mcolor(black) msize(small)  ///
      mlabel(country) mlabvposition(pos) mlabcolor(black)) ///
     (lfit lexp safewater if region == 3, clcolor(black)) ///
    ,name(south_america, replace) ///
     subtitle("South America", color(black)) ///
     ylabel(, angle(0)) ///
     ytitle("Life expectancy at birth") ///
     xtitle("Percent of population with access to safe water") ///
     legend(off)
    
    (1 real change made)
    
    (1 real change made)
    
    (0 real changes made)
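Graphs created with `name()` are kept in memory only for the current Stata session. A minimal sketch of how to list, redraw and save them, assuming the two cells above have been run; the file name `north_america.gph` is hypothetical.

```stata
* List the graphs currently held in memory
graph dir

* Redraw one of them without re-running the plotting commands
graph display north_america

* Keep a copy on disk in Stata's .gph format
graph save north_america "north_america.gph", replace
```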
    
    

    Combining Graphs (requires the named graphs created above)

    In [46]:
    %%stata
    
    graph combine north_america south_america ///
    ,title("Life expectancy by access to safe water", color(black)) col(1)
    

    Combining Graphs with Common Axes (requires the named graphs created above)

    In [47]:
    %%stata
    
    graph combine north_america south_america ///
    ,title("Life expectancy by access to safe water", ///
     color(black)) ///
     xcommon ycommon ///
     xsize(7) ysize(10.5) ///
     col(1)
    
    * Note: the graph below looks almost the same as the one above, but it looks much better when drawn directly in Stata.
    
    
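To keep the version that Stata draws itself, the combined graph can be exported to an image file. A sketch, assuming `north_america` and `south_america` are still in memory; the file name and the pixel width are illustrative choices.

```stata
* Redraw the combined graph with common axes and a tall aspect ratio
graph combine north_america south_america ///
 , title("Life expectancy by access to safe water", color(black)) ///
   xcommon ycommon xsize(7) ysize(10.5) col(1)

* Export the graph just drawn as PNG; width() sets the width in pixels
graph export "life_expectancy_combined.png", width(1400) replace
```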

    Different standard colors available in Stata

    In [48]:
    %%stata
    * After installing the vgsg package ("net install vgsg"), type:
    vgcolormap, quietly
    * Change the marker color in graphs with the mcolor() option
    * (r(199) below means vgcolormap was not found, i.e. vgsg is not installed)
    
    command vgcolormap is unrecognized
    r(199);
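Without installing vgsg, Stata's built-in `graph query colorstyle` lists the named colors it recognizes. Besides a color name, `mcolor()` also accepts an intensity-adjusted color such as `navy*0.6` or an RGB triplet in quotes. A sketch; the particular colors and the graph name `color_sketch` are arbitrary choices for illustration.

```stata
* List the named colors available in this Stata installation
graph query colorstyle

sysuse lifeexp.dta, clear

* Intensity-adjusted named color for one group, RGB triplet for the other
twoway (scatter lexp safewater if region == 2, mcolor(navy*0.6)) ///
       (scatter lexp safewater if region == 3, mcolor("0 114 178")) ///
       , legend(order(1 "North America" 2 "South America")) ///
         name(color_sketch, replace)
```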